Estimating errors reliably in Monte Carlo simulations of the Ehrenfest model 
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Using the Ehrenfest urn model we illustrate the subtleties of error estimation in Monte Carlo 
simulations. We discuss how the smooth results of correlated sampling in Markov chains can fool 
one's perception of the accuracy of the data, and show (via numerical and analytical methods) how 
to obtain reliable error estimates from correlated samples. 

PACS numbers: 



I. INTRODUCTION AND SUMMARY 

The Ehrenfest urn model [1], is sometimes pic- 
turesquely described as fleas jumping between dogs. One 
imagines a sub-system of N numbered fleas residing on 
dog A or dog B, each jumping from one dog to the other 
when its number is called. This model has been pre- 
viously used 0, m, including in this journal [3], to il- 
luminate thermodynamic equilibration and equilibrium. 
Monte Carlo simulations of the process are particularly 
instructive. One of us noticed but let pass with- 
out investigation, the errors associated with such calcu- 
lations. This neglect is remedied in the present paper, 
raising issues known to specialists but perhaps not widely 
enough appreciated. The tutorial exposition given here 
may therefore be of general interest. 

It is worth stressing that, although the message of 
this paper is that single flea hops arc an inefficient way 
to sample the steady-state, the process is ideally suited 
to understanding thermodynamically irreversible transi- 
tions from unlikely to likely configurations 0, d, 0, [1], 
as well as fluctuations in equilibrium, which in typical 
physical situations also proceed in small steps. 

The paper is organized as follows. Section 2 contains 
a brief description of the essence of the Monte Carlo 
method. Although the procedure is useful in cases where 
an enumeration of possibilities is prohibitively difficult, 
the urn model is simple enough to allow explicit analysis. 
In the main body of the paper we exploit only the fact 
that the steady-state probability of n fleas on dog A is 
a binomial distribution, and use this as a check for var- 
ious numerical simulations. In section 3, we show that 
trials of A^-flea configurations yield good results with ex- 
pected errors. We then simulate the single flea transfer 
used in refs.[l, d, [E] and encounter the apparent inac- 
curacies mentioned above. In section 4, correlations be- 
tween successive samples and their effect in reducing the 
number of independent trials is studied, and a numeri- 
cal method ( "binning analysis" ) is used to illuminate and 
eliminate the problem, leading to the conclusions of Sec- 
tion 5. In an Appendix, the Markov (i.e. memoryless) 
random process underlying single flea transfers is treated 
analytically, using methods similar to those in ref . Q , re- 



vealing nice features of the approach to equilibrium and 
the autocorrelation problem. 



II. THE MONTE CARLO METHOD 

It is told that Stanislav Ulam @] invented the Monte 
Carlo method in the 1940s when playing Solitaire while 
lying sick in bed. He wanted to know the probability 
of winning in Solitaire but was faced with the problem 
that with 52! « 10^^ different ways of arranging the cards 
he could never exactly calculate the chance of winning. 
He realized, however, that by just playing 100 games and 
counting the number of wins he could already get a pretty 
good estimate. 

This insight suggested a way of tackling the problem 
caused by the exponential growth with size in the number 
of states of a statistical system. In a general statistical 
context, one might wish to calculate weighted averages 
over configurations. However, even in our very simple 
model the number of ways of distributing fleas between 
dogs is 2^. These configurations may be enumerated by 
2^ A-dimensional vectors x of which each element .t„, 
\ < n < N , can take on two values. If each configuration 
is assigned a normalized weight p{x), '^gpix) = 1, the 
weighted mean of an arbitrary function of the configura- 
tion, A{x) is 



(1) 



An exact summation over all states is, in general, im- 
possible for TV > 40, even on the most powerful super- 
computers. The Monte Carlo method [7|, which Ulam 
named after the famous casinos in Monaco tries to 
estimate such sums by a partial sum over a sample of 
only M <C 2^ configurations Xi 



A = 



1 

-Ya 



(2) 



where the configurations chosen randomly with 

the correct probability vix), and we have introduced the 
shorthand notation Ai = A{xi). 
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Choosing the sample randomly and with the correct 
probabilities is as crucial here as in opinion polls before 
presidential elections: only a truly random and represen- 
tative sample will give meaningful results. 

The estimate A of the true expectation value (A) is a 
fluctuating quantity that will deviate from the true value. 
According to the central limit theorem, A is normally 
distributed around (A) with a standard deviation A^i 
that we shall calculate below. 

As a warmup let us show that the expectation value of 
A is indeed (A): 



_ 1 ^ 

i=l 
M 

i=l 

1 

i=l 

In going from the first to the second line we have used 
linearity of the expectation value; going from the sec- 
ond to the third line we have made use of the fact that 
the samples all chosen from the same distribution 

p{x), so that the ^,;S have the expectation value given by 
Eq. ^. 

Similar reasoning allows the calculation of the average 
of the square of the sample mean. 



which is the basis of the central limit theorem. It is, 
however, more useful to express the error in terms of the 
sampled ^^s. A naive guess would be to estimate the 
where 
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variance as — A 



1 

M ^ 



(7) 



Calculating the expectation values via Eq. ^ shows that 

(8) 

The true estimator is thus 
Var^ 



2^ ^ M 

M 



M - 1 



(9) 



where the (small) fluctuations of the right hand side of 
Eq. Q have been ignored. Taking the square root, we 
obtain the final result 



A. = 



VarA 
M 



-A^ 



M -\ 



(10) 



The —1 in the denominator, which is of course irrelevant 
for the large values of M in the numerical simulations 
below, reflects the loss of one piece of information in cal- 
culating the sample mean. 
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(4) 



where we have inserted the definition of the average ([2|. 
used the linearity of the expectation value, and also ex- 
ploited the fact that for independent samples Xi and Xj 
the expectation value for i ^ j factorizes as 



{A,A,) = {A.){A,) = {A)' 



(5) 



The statistical error Ayi, the root-mean-square deviation 
of the sample mean A from the true expectation value 
(A), is thus given by 



M 



M 
1 

M 
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VarA, 



(6) 



III. DOGS AND FLEAS 

After these preliminaries, let us consider the fleas on 
two dogs game as played in references [1, [H| • The game 
starts with two dogs - a flea-ridden dog B(urnside) with 
iV = 50 fleas and a clean dog A(nik). Once per time step 
a randomly chosen flea hops from one dog to the other, 
so that asymptotically the probability of a flea being on 
one of the dogs is 1/2. In this simple case, it is possible 
to analytically calculate the probability distribution P[rL\ 
for having n of the N fleas on one dog. It is the binomial 
distribution 



1 



iV! 



^■^'f"^" 2^ \ n )- 2^n\{N-n)\ 



(11) 



This exact solution will be very useful as a test for our 
Monte Carlo simulations. 



A. Direct Sampling 

Our first Monte Carlo simulation will not yet follow 
the above game, but will directly sample the asymptotic 
distribution. For each sample, we loop over all fleas and 
draw a uniformly distributed random binary integer u € 
{0, 1}. If M = the flea is positioned on Anik, otherwise 
on Burnside. In order to estimate the distribution P[ri\ 
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FIG. 1: Comparison of the flea distribution P[n] obtained in 
a direct Monte Carlo simulation with the exact asymptotic 
result. M — 10, 000 samples were recorded. 



for the number of fleas n on Anik it will be sufficient 
to record a histogram H[n] counting how often n fleas 
ended up on her. From this histogram we can compute 
an estimate for P[n] as 



P[n] 



1 ■ H[n\ + ■ {M ~ H[n]) H[n] 



M 



M 



(12) 



since our estimator is 1 whenever there were n fleas on 
Anik and otherwise. Since 1^ = 1 and 0^ = we get 
the same estimator for the square 



_ 1' ■ H[n] + 0^ ■ (A/ - H[n]) _ H[n] 
from which wc obtain the error estimate 



A 



P\n] 



H[n]/M - H[nY/NP 
M - 1 



(13) 



(14) 



In Fig. [l]we compare the exact solution to the Monte 
Carlo solution for M = 10, 000 samples and find that, as 
expected from the normal distribution, the exact solution 
lies within error bars about 2/3 of the time. The Monte 
Carlo simulation is working well! 



B. The Dogs and Fleas Simulation 

Next we want to implement the simulation of the dog 
and fleas game. Here we will repeat these simulations, 
observe discrepancies, and explain their origin. 

As introduced above, we start with all iV = 50 fleas on 
Burnside and hence n = 0. In each simulation step we 
will then pick one of the N fleas at random, by drawing a 
uniform integer random number u between 1 and N and 



FIG. 2: Comparison of the flea distribution P[n] obtained in 
Monte Carlo simulations of the original dog and fleas game 
with the exact asymptotic result. Two different random seeds 
were used, M = 10,000 and Al/5 steps were used for equli- 
bration. Something is obviously wrong since the exact results 
are significantly outside the error bars, not even the two sim- 
ulations agree, and the asymmetric shape cannot be right. 



move that flea to the other dog. In practice we label the 
fleas so that the fleas 1 , . . . , n are on Anik and the fleas 
n + 1, . . . , N on Burnside. Hence if m < n we move a flea 
from Anik to Burnside and decrease n by one, otherwise 
we move a flea in the opposite way and increase by 1. 

In our simulation we need to wait a while until the 
fleas have equilibrated and we can expect to observe the 
asymptotic distribution. We thus perform M/5 flea hops 
for equilibration, without recording any measurements. 
Only then do we start with the actual simulation and 
perform M flea hops, recording a histogram H[n\. 

In simple examples like this simulation we might ac- 
tually be able to guess the number of steps needed for 
equilibration. As we show in the appendix, only about 
50 hops are needed to reach equilibrium. Why then did 
we throw away 20%, or 2,000 samples? The reason is that 
in more complex cases we often have no idea of the actual 
equilibration times. It is then strongly recommended to 
err on the side of throwing away too many samples rather 
than too few. By throwing away the first 20% of our sam- 
ples we increase our statistical error by only about 10% 
(remember the inverse square root scaling of the error 
with the number of samples), which is a small price to 
pay to be on the safe side regarding equilibration. 

In Fig. [2] we again compare to the exact solution and 
observe deviations remarked on before [1]. At first sight, 
the deviations are puzzling, since the curves look smooth. 
However, the asymmetric shapes cannot be correct, and 
the errors bars, calculated using Eq. (fTO|) with M = 8000 
are evidently too small. That these features are general 
can be seen by repeating the simulations with different 
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random seeds: sometimes the results look mostly right, 
but often they are just plainly wrong as in Fig. [21 The 
large variations observed also confirm that something is 
wrong with the error estimates. 

A little further thought suggests the reason. Eq. (fTUl) 
is an estimate for the relative deviation from the mean of 
M trials of a binomial process with a success probability 
estimate P[n\. But, M single flea hops is not the same as 
M trials of the whole distribution as performed to obtain 
Fig. [11 



IV. AUTOCORRELATION EFFECTS AND 
ERROR ESTIMATES 

We need to reconsider the derivation of the errors in 
equations ^ to (fTP)) . The only assumption, besides a 
finite variance, was in Eq. ([5]): the independence of sam- 
ples Xi and Xj for i ^ j. While this independence was 
clearly given in the direct simulation — at least as long 
as we use independent random numbers to create the 
flea distributions — it is not true of the original dogs 
and fleas simulation, in which subsequent samples dif- 
fer only by a single random process. They form what is 
called a "Markov chain." As just remarked, this method 
of sampling evidently explores the space of states much 
less efficiently than the calculation of Fig. [1] in which ev- 
ery flea is addressed at every trial. Equation ([5]) and thus 
also the error estimate pO)) are not valid for correlated 
samples from a Markov chain. The correlation between 
samples is also responsible for the smooth shape of the 
results, which fools our intuition about the errors of the 
results. 

In the following we will discuss two methods for ob- 
taining reliable errors of a Monte Carlo simulation 



A. Error Estimates from Independent Simulations 

The easiest way of obtaining reliable error estimates is 
to create independent samples. To obtain them we per- 
form the simulation multiple times with different random 
seeds. In each simulation we record an estimate for P[n]. 
Then we obtain a final estimate for P[n\ by averaging the 
P[n\ obtained in the individual simulations and an error 
estimate by applying Eq. pO)) to the P[n] obtained from 
these independent simulations. 

We show results from performing L = 10 simulations 
oi M/L — 1000 measurements each in Fig. [31 While we 
still see large deviations, and the maximum appears too 
high as in Ref. [3, Q , the error bars are now much larger 
and appear correct — they include the correct value most 
of the time! 

While performing L independent simulations gives re- 
liable error bars we pay the cost that each of the L sim- 
ulations needs to be equilibrated independently, so that 
in our case we performed LM/5 = 20, 000 equilibration 
steps in addition to M = 10, 000 measurement steps. 




n 

FIG. 3: Comparison of the flea distribution P[n] obtained in 
Monte Carlo simulations of the original dog and fleas game 
with the exact asymptotic result. Now L = 10 independent 
simulations were performed for a total of M = 10, 000 mea- 
surements. Each simulation performing M/L measurements 
was equilibrated for M/5 steps. Now the error bars, estimated 
from the L = 10 independent simulations are much larger and 
agree with the exact result, but at the cost of having to equi- 
librate L simulations. 



B. Error Estimates from Uncorrelated Samples 

Another way of obtaining reliable errors is not to mea- 
sure after every flea hop, but to let many fleas hop before 
performing a measurement. In Fig. [Jwe show the results 
from a simulation performing A'hop = 99 flea hops 0] be- 
tween each of the M = 10, 000 measurements. 

Now the Monte Carlo results agree with the exact re- 
sults and the error bars are again much smaller, but at 
the cost of having to perform A^hop ~ 99 times more flea 
hops, and also losing all of the potentially useful infor- 
mation between measurements. In addition, we have no 
way of knowing whether A^hop = 99 hops between mea- 
surements are sufficient to create uncorrelated samples 
for which Eq. ([5]) holds, or whether a much smaller suf- 
fices or a much larger number is needed. 



C. Error Estimates for Correlated Samples 

While the above approach clearly demonstrates that 
indeed the correlations between samples Xi and 
the origin of our problems, it is not a viable solution. 
Instead let us correct the error estimate Eq. pH)) for the 
case of correlated samples by including the terms which 
we omitted above under the assumption of independence 
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FIG. 4: Comparison of the flea distribution P[n] obtained in 
Monte Carlo simulations of the original dog and fleas game 
with the exact asymptotic result. Now A^'hop ~ 99 flea hops 
were made between the M = 10, 000 measurements and M/5 
steps were used for equlibration. A''hop = 99 seem to be 
enough hops to decorrelate the samples and give reliable error 
estimates . 



line we used the definition of the integrated autocorrela- 
tion time of A: 



TA 



(A^) - {Ay 



(17) 



Inserting Eq. (|T6|) into Eq. (|15p we end up with the final 
error estimate 



and see that due to correlation effects 
creased by a factor of y/T 



(18) 



the error is in- 
2ta- Eq. p8|) . in fact, very 
nicely gives the effective number of uncorrelated samples 
as [M/ (1 + 2ta)] < M. While this explains the failures of 
the simple error estimate (jlO[) . it does not help us much 
yet since the estimation of ta via Eq. (|17p is expensive 
and cumbersome. A fast and easy way of estimating er- 
rors is explained below and an exact calculation of the 
autocorrelation time for this model is presented in the 
appendix. 



D. Error Estimates from a Binning Analysis 



^ to obtain: 



YavA 
M 



M 



(15) 



Previously we had assumed that, due to independence 
the second term is zero. Let us now replace the assuim 
tion of independence by a rapid decay as |i — j| — > (X) [K 
and rewrite the second term as 



^ E {{A,A,) - {Ay) 

M 



2 

M2 



{AY 



M M- 



M- 



-1 t= 

-1 



M 



E - {AY) 



M ^ 

t=i 



{{AiA 



i+t/ 



M 



(VarA)T^. 



(16) 



Going from the second to third line we relabeled the in- 
dices, in the next line we used the identical distributions 
to limit the sum over i to the first index, in the fifth line 
we extended the sum over t to infinity since the correla- 
tions are expected to decay fast enough, and in he last 



The binning analysis is a method of analyzing Monte 
Carlo data based on Eq. p^ . It provides both an esti- 
mate for the error Aa and for the integrated autocorre- 
lation time TA- 

Starting from the original series of measurements 



A?^ = A„ 



(19) 



we iteratively create "binned" series by averaging over 
two consecutive entries: 



A 



(i-i) 

2i~l 



A. 



(/-I) 

2i 



(20) 



fori = l,...,Mi = M/2'. 

Every entry in this new and shorter time series is the 
average of two adjacent values in the original one. The 
mean of the new binned time series is the same as the 
original time series. The averaged values are, however, 
less correlated than the original ones. The (incorrect) 
error estimates using the equation (|10p for uncorrelated 
samples gives errors 



A 



(0 



A 



Ain 



(21) 



that increase as a function of bin size 2'. These errors 
converge to the correct error estimate: 



= lim A 



(0 



(22) 



when the bins become uncorrelated for sizes 2^ ^ ta- 
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FIG. 5: Binning analysis of the error Apps] of the central 
value P[25] of the distribution, it is clearly seen that for 
M = 10, 000 samples the errors have not yet converged, while 
for M = 100, 000 samples convergence starts to be seen. At 
least M = 100, 000 samples have to be taken to get reliable 
results. 



This binning analysis thus gives a reliable recipe for 
estimating errors and autocorrelation times. One has to 
calculate the error estimates for different bin sizes I and 
check if they converge to a limiting value. If convergence 
is observed the limit Aa is a reliable error estimate, and 
TA can be obtained from equation (|18p as 



TA 



.(0) 



- 1 



(23) 



If however no convergence of the A^^ is observed we 
know that ta is longer than the simulation time and we 
have to perform much longer simulations to obtain reli- 
able error estimates. 

Let us redo one of the simulation of section IIII Bl and 
perform a binning analysis. In Fig. [6]we show our results 
for M — 100, 000 measurements calculating the errors 
using the binning analysis. Now everything is in order! 

It is worth noting that the autocorrelation time de- 
pends on the variable being sampled. For example, cal- 
culating this quantity for the number n of fleas yields 
24.0, which is larger than the value obtained from Fig. [5] 
for the peak of the histogram. 

To implement the binning analysis it is not neces- 
sary to store the full time series. Instead memory of 
21og2Af numbers is sufficient. Interested reader are en- 
couraged to look at the implementation in the source file 
src/alps/alea/simplebinning.hof the ALPS libraries 

El. 
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FIG. 6: Comparison of the flea distribution P[n] obtained in 
Monte Carlo simulations of the original dog and fleas game 
with the exact asymptotic result. This time M = 100, 000 cor- 
related measurements were taken, and the errors calculated 
using a binning analysis: all is fine! 



V. CONCLUSIONS: LESSONS LEARNED 

In the discussion of the dogs and fleas simulation we 
have seen some of the subtleties and pitfalls in estimat- 
ing reliable errors for results of Monte Carlo simulations. 
Correlation effects make it necessary to perform a bin- 
ning analysis instead of using the simple Eq. PH)) which 
is valid only for independent samples. 

We have not touched on the issue of cross-correlations 
between different quantities, that influence error esti- 
mates of e.g. the specific heat Cy = {{E"^) — {E)^)/kBT^- 
To calculate such errors, a bootstrap or jackknife method 
is required in addition to a binning analysis. 

An important lesson learned is that a reliable analy- 
sis of errors of a simulation can be much harder than 
performing the simulation, but is an essential part for 
any numerical project. We could have drawn incorrect 
conclusions and conjectured a new physical phenomenon 
based on our too small error bars! 

We have also seen that using improved methods, such 
as the direct sampling of the distribution in Fig[l] smaller 
errors and more reliable results can be obtained. Unfor- 
tunately direct sampling is impossible in all but the sim- 
plest models — but still improved algorithms are the key 
to reliable large scale simulations. It is interesting that 
over the past three decades progress in algorithms for the 
simulation of the Ising model has outperformed Moore's 
law: running modern algorithms on 30 year old comput- 
ers would be faster than running 30 year old algorithms 
on the fastest supercomputers of today p^ ! 

All of the programs used to produce the data in this 
paper are included in the example/sampling directory 
of the latest release of the ALPS libraries ^lli] . 



7 



Acknowledgements 

VA thanks his colleague Erich Mueller for suggest- 
ing the generating function method used in the Ap- 
pendix, and Cornell graduate students Frank Petruzielo 
and Bryan Daniels for C++ instruction. MT acknowl- 
edges support of the Aspen Center for Physics. 



APPENDIX 

The evolution of the number of fleas on Anik in our 
Monte Carlo simulation is done probabilistically using a 
"Markov process." Let Pi[n],n — 0, 1, . . . N, be the i-th 
update of the probability of n fleas on Anik. Then 



N - 



N 

N 



-P^[n-l] 



N 



-P^[n + l] 



(24) 



n'=0 



where the coefficients are the relative probabilities for a 
flea to hop on or off, written in the last line in terms of a 
{N + 1) X (A'' + 1) tridiagonal matrix M with the entries 
N, N- 1, AT- 2, ... 2, 1 on the sub-diagonal, 1, 2, 3 ... A- 
1 , A^ on the superdiagonal, and zeros elsewhere 



M = 



A^ 





Vo 



1 



A- 1 







A^- 1 







N 
0/ 



(25) 



The A^ + 1 eigenvalues A and right eigenvectors r[n] of 
this matrix can be obtained from a generating function 



N 
n=0 



(26) 



When used with the eigenvalue equation 
J2n' ^{''^^^'VW] — ^''N' / is seen to obey the 
differential equation 



ov ou 



(27) 



whose solution of the required form f{u,v) — v h{u/v) 
is 



(28) 



where K\ is independent of u and v. The series 
in u and v must terminate, requiring the two expo- 
nents in Eq. (pS)) to be non-negative integers, and 
thus implying that the eigenvalues are ±A^. ± (A^ — 
2), . . . , ±1 (for odd A^) or (for even N). 



The left (dual) eigenvectors l[n] are generated by 

N 
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{u, v) ~ u"ti 



N-r 



N 



(29) 



the coefficient being the combinatorial coefficient defined 
in Eq. (fTT|) . When used with the eigenvalue equation 
l[n']M[n' , n] = AZ[n], g is seen to obey the identical 
differential equation as /, namely Eq. ([^5]) . The con- 
stants in the solution Eq. (|28[) determine normalization. 
One choice is to take K\ — ^ for fx and 



K ( ^ 



(30) 



for g\, whereupon rjv[n] is given by PeqM, Eq- (HUj the 
stationary normalized solution of Eq. ((24)) , and In [n] ~ 1 
for every n. That this choice also achieves the complete- 
ness relation for orthonormal eigenvectors. 



'^rx[n]lx[n'] = 5„ 



(31) 



can be seen from Eqs. ([21]), dM]), and (1^ . 

These considerations facilitate analysis of the approach 
to equilibrium. The initial condition of fiealess Anik may 
be written using Eq. (PTjl as 



(32) 



whereupon t steps of the evolution Eq. ((24l) yield 



= E^^W^^M = E {^)'rx[n\lxm. (33) 



A 



A 



A^^ 



The moments of this evolved distribution may now be 
calculated. Comparing partial derivatives with respect 
to u of Eqs. (gni) and (US]) one finds 



A^, 



^ n rx[n] = ^(5a,w - t;^\n-2 (34) 



and 



n{n - 1) rx[n] = 



N{N-l) (A--1) 



(35) 



Since Ix [0] can be seen to be equal to the Kx of Eq. (|30p , 
one deduces that 

J2nP,[n]^Nf,{t) = [!-(!- )*] 



N , 



2 ^ ^ 



(36) 



showing that the mean number of fieas approaches equal 
partitioning exponentially with an equilibration time 
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N/2. The decay as (1 — ■^)* = A2 is actually a general 
result: in any Markov process the equilibration is con- 
trolled asymptotically by the second largest eigenvalue 
A2. 

In a similar way, it is seen using Eqs. p4|) and pS]) 
that the mean square fluctuation of the number at time 
step t is given, within the exponential approximation of 
the last line of Eq. by 



E 



(n - ^i{t)f Pt[n] = NiJi{t){l ~ ^l{t)). (37) 



This shows, interestingly, that the relation between the 
mean and width of a binomial distribution for the proba- 
bilities associated with tossing a biased coin is preserved 
during stages of the evolution long before equilibrium is 
reached. 

These methods also permit the exact calculation of the 
integrated autocorrelation time r^, defined in Eqs. (dH 
I17p . for this simple model. As an example, we consider 
the number n of fleas on Anik and calculate the corre- 
sponding autocorrelation time r„. We need to calculate 
the average {n'n) , where n' is the number of fleas a given 
number of hops later than an n-flea state. For t hops, 
this average is 



,M^[n',n] 



nPeq [n] , 



(38) 



where M is given in Eq. (|25p and P^q is the equilibrium 
distribution of Eq. pTjl . In Eq. (1551) ; picked at ran- 
dom from the known correct distribution and n' is corre- 
lated with n via the conditional probability for t hops. 

Now M can be represented in terms of its eigenvalues 
and eigenvectors as 



M[n',n]^Y.^x[n']\lx[nl 



(39) 



and it folows, using the orthonormality relation 
Z)„^aM rx'[n] = (5a, A', that 

= E''^ [" Ktf) (40) 



The contribution of the highest eigenvalue A = iV to 
this sum, obtained from the eigenvectors given above 
Eq. (|3ip . is found to be independent of n and equal to 



Pggi'T-'] for any t. This convenient fact leads to the iden- 
tity 



^-^e,[n']=5:rAK](A)*^,H- (41) 



When the right hand side of this form is substituted into 
Eq. ([55]) one encounters the average given in Eq. (|34p . 
and also the average 

Y,nP,,[n] IM = ( A) [^hM~\h,N-2l (42) 

n \ 2 / 

which has been evaluated via a partial derivative with 
respect to u of the generating function g in Eq. (|28p . 
Thus only the eigenvalue A = — 2 contributes to the 
happily simple result 



ElC^t - [n?] = E E^'^^^^Y^^ h[n]nP,M 



N \ 1-2/A^ 1 , 
A^-iy 2/N ^2' 
N{N ^2) 



(43) 



Since the equilibrium variance of n is A'^/4, we see by com- 
parison with Eq. (fT7|) that the integrated autocorrelation 
time for sampling A^ fleas one at a time is 



Tn{N) 



N-2 



so that M single hops are equivalent to only 



Meff = 



M 



M 



2r„ -t- 1 N -I 



(44) 



(45) 



trials of N-flea configurations. In our simulation of 
A^ — 50 fleas we determined t„ = 24.0 using the bin- 
ning analysis, in perfect agreement with the prediction 
T„ = (iV-2)/2 = 24. 

It is gratifying that the above learned considerations 
show that randomizing the number of 'heads' among 
A^ coins by arbitrarily choosing and turning over 1 is 
1/{N — 1) times as effective as tossing all A^ at once. As 
mentioned in the Introduction, it is the less efficient pro- 
cesses which are typically at work in physical situations. 
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